R News 2007/2

Authors

  • Hemant Ishwaran
  • Udaya B. Kogalur
Abstract

In this article we introduce Random Survival Forests, an ensemble tree method for the analysis of right-censored survival data. As is well known, constructing ensembles from base learners, such as trees, can significantly improve learning performance. Recently, Breiman showed that ensemble learning can be further improved by injecting randomization into the base learning process, a method called Random Forests (Breiman, 2001). Random Survival Forests is closely modeled after Breiman’s approach. In Random Forests, randomization is introduced in two forms. First, a randomly drawn bootstrap sample of the data is used for growing the tree. Second, the tree learner is grown by splitting nodes on randomly selected predictors. While at first glance Random Forests might seem an unusual procedure, considerable empirical evidence has shown it to be highly effective. Extensive experimentation, for example, has shown that it compares favorably to state-of-the-art ensemble methods such as bagging (Breiman, 1996) and boosting (Schapire et al., 1998). Random Survival Forests, being closely patterned after Random Forests, naturally inherits many of its good properties. Two features are especially worth emphasizing: (1) It is user-friendly in that only three fairly robust parameters need to be set (the number of randomly selected predictors, the number of trees grown in the forest, and the splitting rule to be used). (2) It is highly data adaptive and virtually free of model assumptions. This last property is especially helpful in survival analysis. Standard analyses often rely on restrictive assumptions such as proportional hazards. Also, with such methods there is always the concern of whether associations between predictors and hazards have been modeled appropriately, and whether non-linear effects or higher-order interactions for predictors should be included. In contrast, such problems are handled seamlessly and automatically within a Random Forests approach.
While R currently has a Random Forests package for classification and regression problems (the randomForest() package ported by Andy Liaw and Matthew Wiener), there is currently no version available for analyzing survival data. The need for a Random Forests procedure separate from one that handles classification and regression problems is well motivated, as survival data possess unique features not handled within a CART (Classification and Regression Tree) paradigm. In particular, the notion of what constitutes a good node split for growing a tree, what prediction means, and how to measure prediction performance pose unique problems in survival analysis. Moreover, while a survival tree can in some instances be reformulated as a classification tree, thereby making it possible to use CART software for a Random Forests analysis, we believe such approaches are merely stop-gap measures that will be difficult for the average user to implement. For example, Ishwaran et al. (2004) show under a proportional hazards assumption that one can grow survival trees using the splitting rule of LeBlanc and Crowley (1992) with the rpart() algorithm (Therneau and Atkinson, 1997), hence making it possible to implement a relative risk forests analysis in R. However, this requires extensive coding on the user's part, is limited to proportional hazards settings, and the splitting rule used is only approximate.
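
The two forms of randomization described in the abstract (a bootstrap sample per tree, and a random subset of predictors considered at a node split) can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not grow trees or apply any splitting rule; it only shows where the randomness enters. The function names are hypothetical, and the parameter names `mtry` and `n_trees` are assumptions chosen for illustration, standing for the number of randomly selected predictors and the number of trees.

```python
import random


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def candidate_predictors(n_predictors, mtry, rng):
    """Pick mtry distinct predictor indices to consider at one node split."""
    return sorted(rng.sample(range(n_predictors), mtry))


def plan_forest(n_samples, n_predictors, mtry, n_trees, seed=0):
    """Return, for each tree, its bootstrap rows and root-node candidates.

    Hypothetical helper: a real forest would redraw the candidate
    predictors at every node and choose the best split among them
    using the selected splitting rule, which is omitted here.
    """
    rng = random.Random(seed)
    plans = []
    for _ in range(n_trees):
        plans.append({
            "in_bag": bootstrap_indices(n_samples, rng),
            "root_candidates": candidate_predictors(n_predictors, mtry, rng),
        })
    return plans


if __name__ == "__main__":
    for tree_id, plan in enumerate(plan_forest(10, 5, 2, 3, seed=1)):
        print(tree_id, plan["in_bag"], plan["root_candidates"])
```

Because sampling is with replacement, some rows appear several times in a tree's bootstrap sample and others not at all, so each tree sees a different version of the data.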

Publication date: 2007